Proportionate gradient updates with PercentDelta
Author
Abstract
Deep neural networks are generally trained using iterative gradient updates. Gradient magnitudes are affected by many factors, including the choice of activation functions and initialization. More importantly, gradient magnitudes can differ greatly across layers, with some layers receiving much smaller gradients than others, causing those layers to train more slowly and therefore slowing down overall convergence. We analytically explain this disproportionality. We then propose to explicitly train all layers at the same speed, by scaling the gradient of every trainable tensor to be proportional to the tensor's current value. In particular, at every batch we update all trainable tensors such that the relative change of each tensor's L1 norm is the same across all layers of the network, throughout training. Experiments on MNIST show that our method scales gradients appropriately, making the relative change in trainable tensors approximately equal across layers. In addition, measuring test accuracy over training time shows that our method trains faster than other methods, reaching higher test accuracy within the same budget of training steps.
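The per-layer scaling described above can be sketched as follows. This is one reading of the abstract (rescaling each gradient so the L1 norm of the update is a fixed fraction of the tensor's L1 norm), not necessarily the paper's exact update rule; the function name `percent_delta_step` and the `percent` target are illustrative.

```python
import numpy as np

def percent_delta_step(params, grads, percent=0.001, eps=1e-12):
    """One update over all trainable tensors, scaled so that every
    tensor's relative L1-norm change equals `percent`, regardless of
    the raw gradient magnitude in that layer (a sketch of the idea
    in the abstract)."""
    new_params = []
    for w, g in zip(params, grads):
        g_norm = np.abs(g).sum()  # L1 norm of the raw gradient
        w_norm = np.abs(w).sum()  # L1 norm of the current tensor
        # Choose the scale so that ||scale * g||_1 = percent * ||w||_1.
        scale = percent * w_norm / (g_norm + eps)
        new_params.append(w - scale * g)
    return new_params
```

With this rule, a layer whose gradients are tiny relative to its weights receives a proportionally larger step, so all layers change by the same fraction per batch.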
Similar resources
Impact of Novel Incorporation of CT-based Segment Mapping into a Conjugated Gradient Algorithm on Bone SPECT Imaging: Fundamental Characteristics of a Context-specific Reconstruction Method
Objective(s): The latest single-photon emission computed tomography (SPECT)/computed tomography (CT) reconstruction system, referred to as xSPECT Bone™, is a context-specific reconstruction system utilizing tissue segmentation information from CT data, which is called a zone map. The aim of this study was to evaluate the effects of zone-map enhancement incorporated into the ordered-subset conjug...
A SET-MEMBERSHIP APPROACH TO NORMALIZED PROPORTIONATE ADAPTATION ALGORITHMS (TueAmOR12)
Proportionate adaptive filters can improve the convergence speed for the identification of sparse systems as compared to their conventional counterparts. In this paper, the idea of proportionate adaptation is combined with the framework of set-membership filtering (SMF) in an attempt to derive novel computationally efficient algorithms. The resulting algorithms attain an attractive faster conve...
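The proportionate-adaptation idea this snippet builds on can be illustrated with a classic PNLMS-style update, where each filter coefficient gets a step size proportional to its current magnitude so the few active taps of a sparse system adapt fastest. This is an illustrative sketch of plain proportionate NLMS, not the set-membership variant the paper derives; all names and parameter values are assumptions.

```python
import numpy as np

def pnlms_step(w, x, d, mu=0.5, rho=0.01, delta=1e-6):
    """One proportionate-NLMS update of filter `w` given input
    vector `x` and desired output `d`. Gains `g` give larger steps
    to larger-magnitude coefficients (sketch, not the SMF variant)."""
    e = d - w @ x                                        # a priori error
    g = np.maximum(np.abs(w), rho * max(delta, np.abs(w).max()))
    g = g / g.sum()                                      # proportionate gains
    w_new = w + mu * e * (g * x) / (x @ (g * x) + delta)
    return w_new, e
```

On a sparse target (one nonzero tap), the large tap dominates the gain vector and converges quickly while the zero taps receive only the small floor `rho`.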
New Proportionate Affine Projection Algorithm
A new proportionate-type affine projection algorithm with intermittent update of the weight coefficients is proposed. It takes into account the “history” of the proportionate factors and uses a fast recursive filtering procedure. Also, the effect of using dichotomous coordinate descent iterations is investigated. Simulation results indicate that the proposed algorithm has improved performance a...
Sparse Communication for Distributed Gradient Descent
We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero then exchange sparse matrices. This method can be combined with quantization to further improve the compression. We explore different configurations and a...
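The drop-the-smallest-99% scheme described above can be sketched as a top-k sparsification step. Function names (`sparsify`, `densify`) and the index/value exchange format are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sparsify(grad, keep_ratio=0.01):
    """Keep only the largest `keep_ratio` fraction of gradient entries
    by absolute value; return their flat indices and values, suitable
    for a sparse exchange (sketch of the scheme described above)."""
    flat = grad.ravel()
    k = max(1, int(len(flat) * keep_ratio))
    # argpartition finds the k largest |entries| without a full sort.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(idx, values, shape):
    """Reconstruct a dense gradient from the exchanged sparse form,
    with all dropped entries mapped to zero."""
    out = np.zeros(int(np.prod(shape)))
    out[idx] = values
    return out.reshape(shape)
```

Exchanging only `(idx, values)` pairs cuts communication roughly by the keep ratio, at the cost of discarding the near-zero bulk of the update.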
Towards health equity: a framework for the application of proportionate universalism
INTRODUCTION The finding that there is a social gradient in health has prompted considerable interest in public health circles. Recent influential works describing health inequities and their causes do not always argue cogently for a policy framework that would drive the most appropriate solutions differentially across the social gradient. This paper aims to develop a practice heuristic for prop...
Journal:
- CoRR
Volume: abs/1708.07227
Pages: -
Publication date: 2017